In this project, we present an interactive display of adult obesity rates across U.S. states and, more pertinently, analyze how differences in nutritional intake across states correlate with obesity rates. We also analyze differences in the level of physical activity across states. Since diet and physical exercise are widely known to play a key role in weight and health, we hypothesize that states with a large number of fast food outlets see a greater occurrence of obesity, as the presence of such unhealthy food options encourages adults living in these states to adopt unhealthy diets and neglect their weight and health. In particular, our analysis consists of three major components:
The graph shows that the overall obesity rate increased from 27.4% to 30.1%. Moreover, the overall obesity rate is higher for men than for women.
People with higher levels of education tend to have lower obesity rates, and people with a college degree have the lowest obesity rate.
This graph shows that middle-aged groups, from 35 to 64, tend to have the highest obesity rates, while young adults aged 18 to 24 have the lowest.
This graph shows that people with higher incomes tend to have lower obesity rates.
West Virginia has the highest obesity rate at 38.1%, while Colorado has the lowest at 22.6%.
Vermont has the highest physical activity rate at 59.7%, while Puerto Rico has the lowest at 19.6%.
There are 676 fast food restaurants in California.
The most common fast food restaurants are McDonald’s, Burger King, and Taco Bell.
To understand the distribution of fast food restaurants in the U.S., we visualized a random sample of 10,000 fast food restaurants from the Datafiniti dataset. The graph focuses on the contiguous U.S. and shows a disproportionately dense distribution of fast food restaurants on the East and West Coasts compared with Middle America, which is understandable given that coastal areas have higher population density. We are also interested in how the distributions of the most popular restaurants differ.
This plot shows that McDonald’s is popular everywhere, while Burger King is more popular in the north and Taco Bell is more popular in Middle America, with higher density in states like Illinois, Indiana, Ohio, and Wisconsin.
We begin by examining obesity rates across the U.S. for the first available year, 2011.
The obesity rates were split into three groups: under 25%, 25-30%, and 30% and up. In 2011, 9 states were shaded green, meaning their obesity rates were under 25%.
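The three-way grouping used for the map shading can be sketched as a small function. This is only an illustration: the function name `obesity_group` and the treatment of rates that fall exactly on a boundary (25% goes to the middle group, 30% to the top group) are assumptions, since the report does not say how boundary values were assigned.

```python
def obesity_group(rate_percent: float) -> str:
    """Assign an obesity rate (in percent) to one of the three map groups.

    Boundary handling is an assumption: exactly 25% lands in "25-30%"
    and exactly 30% lands in "30% and up".
    """
    if rate_percent < 25.0:
        return "under 25%"
    if rate_percent < 30.0:
        return "25-30%"
    return "30% and up"


# The two state rates quoted in this report (highest and lowest).
quoted_rates = {"West Virginia": 38.1, "Colorado": 22.6}
groups = {state: obesity_group(rate) for state, rate in quoted_rates.items()}
```

With this rule, Colorado falls in the lowest group and West Virginia in the highest, which matches how the two states are described in the text.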
By 2017, however, the situation had changed dramatically: only one state on the map still had an obesity rate under 25%. The map is presented below.
We therefore examine how obesity rates changed over this period.
The grid shows the spread of obesity over the past 6 years. The red-shaded and blue-shaded states gradually increased in number and took over states with low obesity rates. In 2013, there were 6 green states left; in 2015, there were only 5. By 2017, Colorado stood as the only state with an obesity rate under 25%, at 22.6%.
To further explore the relationship, we mapped the distribution of fast food restaurants against state obesity levels in 2017.
By clustering the distribution of fast food restaurants, we see that states with obesity rates of 30% and up, indicated by the red shade, do have a heavier concentration of fast food restaurants in Middle and Eastern America. However, a high-obesity state like Alaska does not have as many fast food restaurants.
This finding strengthened our interest in exploring how fast food restaurants relate to other behaviors. We next explore fast food restaurant distribution alongside the physical activity of state residents. In the following graphs, fast food restaurants are shown as orange dots on the map.
The fast food and exercise map shows that states in the central and southern U.S. have lower activity rates and are also located in areas dense with fast food restaurants. The Northeastern U.S. has activity rates around 50-55%, and its fast food restaurant distribution is also dense, so in the Northeast, at least, the presence of fast food does not appear to decrease exercise rates. The West Coast has higher activity rates and a sparser distribution of fast food restaurants. Alaska has a high activity rate and few fast food restaurants. The only place on the map with an activity rate under 35% is Puerto Rico, but the dataset does not include information on fast food restaurants there.
We next explore whether fast food restaurant distribution is closely related to residents’ nutritional intake. We mapped vegetable intake against the fast food distribution.
The vegetable intake graph shows that in the upper northern states, fewer than 15 percent of residents eat vegetables less than once a day. In all East Coast states except New York, only 15-20 percent of residents eat vegetables less than once a day, whereas in the southwestern U.S. around 25-30 percent do. Interestingly, more than 35 percent of Puerto Rico residents eat vegetables less than once a day. Alaska, on the other hand, is shaded blue, with 19 percent of residents eating vegetables less than once a day.
We collected text data from Twitter using keywords relevant to our project, such as “fastfood”, “obesity” and “diet”. Since “obesity” has a negative connotation, querying for the term “obesity” might have skewed our results toward negative sentiment.
The bar chart indicates that states like Texas, South Carolina, Montana and California tweet positively about fast food, while states like Arizona, Colorado and Michigan tweet negatively about it.
In the plot of fast food related words, the words with the highest frequency relate to the food actually sold at fast food chains, such as “taco”, “whopper” and “sandwich”. This supports the previous finding that Burger King and Taco Bell are among the most popular fast food chains in the U.S. In addition, some words such as Whopper(R) seem to come from tweets by the restaurants themselves, so it could be that states with positive sentiment toward fast food are merely being targeted more heavily by food advertising.
For tweets related to obesity and diet, the word cloud shows words related to health (such as workout, body and weight), diet (such as vegan, coke and keto) and obesity-related diseases such as diabetes. This suggests that people who talk about obesity and diet are generally concerned about obesity and seek to improve their lifestyles and diets in order to prevent it.
The scatterplot shows the relationship between the obesity rate of each U.S. state and the overall sentiment of Twitter text related to obesity. Sentiment was classified using the Bing lexicon. One might expect a positive relationship between obesity rate and sentiment toward obesity, such that states with a higher occurrence of obesity have tweets that display a more positive sentiment toward obesity. From the plot, however, we do not see a clear linear relationship between obesity rate and text sentiment. Instead, the plot suggests a quadratic relationship, in which states with the highest obesity rates display more extreme sentiments toward obesity, both positive and negative, than states with low obesity rates.
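As a rough illustration of lexicon-based scoring of the kind described here, the sketch below counts positive and negative words in a set of tweets and returns their difference. It is not the pipeline used for the scatterplot: `TOY_LEXICON` is a tiny hand-picked stand-in rather than the actual Bing lexicon, the example tweets are invented, and the net score (positive count minus negative count) is one common convention that we assume for illustration.

```python
import re
from collections import Counter

# Tiny stand-in lexicon for illustration only (hypothetical entries);
# the real Bing lexicon labels several thousand words.
TOY_LEXICON = {
    "risk": "negative", "death": "negative", "kill": "negative",
    "cancer": "negative", "healthy": "positive", "good": "positive",
    "love": "positive",
}


def net_sentiment(tweets):
    """Return (# positive words) - (# negative words) across all tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase and split into simple word tokens.
        for word in re.findall(r"[a-z']+", tweet.lower()):
            label = TOY_LEXICON.get(word)
            if label is not None:
                counts[label] += 1
    return counts["positive"] - counts["negative"]


# Invented example tweets standing in for one state's obesity-related tweets.
example_tweets = [
    "Obesity raises the risk of cancer",
    "Love this healthy diet, feeling good",
]
score = net_sentiment(example_tweets)
```

Computing one such score per state and plotting it against the state's obesity rate would give a scatterplot of the kind discussed above.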
The second word cloud shows the negative words in obesity and diet related tweets. It seems that people who tweet about obesity and diet are most concerned with the adverse health complications associated with obesity. The words with the highest frequencies include kill, risk, death and cancer, all of which are extremely negative words related to health problems caused by obesity.
Next, we explore whether people in states with high and low obesity rates tweet different words. The graph of the log odds ratio of tweets from high-obesity states over low-obesity states reveals that people in states with high obesity rates are more likely to tweet extreme, negative words associated with obesity, such as “risk”, “disease” and “death”. In comparison, people in states with low obesity rates are more likely to tweet about adopting healthy lifestyles and diets, using words like “read”, “calorie” and “nutrient”. This suggests that people in states with low obesity rates are more concerned with maintaining their health and weight than people in states with high obesity rates.
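The log odds ratio comparison can be sketched as follows. The add-one smoothing (so that a word missing from one group does not produce a division by zero) and the natural logarithm are assumptions, as the report does not state the exact formula; the function name `log_odds_ratio` and the token lists are made up for illustration.

```python
import math
from collections import Counter


def log_odds_ratio(high_words, low_words):
    """Log ratio of each word's smoothed share in high- vs low-obesity tweets.

    Positive values mean the word is relatively more common in tweets from
    high-obesity states; negative values mean the opposite.
    """
    high_counts, low_counts = Counter(high_words), Counter(low_words)
    high_total = sum(high_counts.values())
    low_total = sum(low_counts.values())
    ratios = {}
    for word in set(high_counts) | set(low_counts):
        # Add-one smoothing is an assumption, not the report's stated method.
        high_share = (high_counts[word] + 1) / (high_total + 1)
        low_share = (low_counts[word] + 1) / (low_total + 1)
        ratios[word] = math.log(high_share / low_share)
    return ratios


# Made-up token lists standing in for the two groups of tweets.
high_obesity_tokens = ["risk", "risk", "death", "calorie"]
low_obesity_tokens = ["calorie", "calorie", "nutrient", "risk"]
ratios = log_odds_ratio(high_obesity_tokens, low_obesity_tokens)
```

Sorting `ratios` by value and plotting the most positive and most negative words gives a chart of the kind described above.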
Lastly, we visualized the network of bigrams in obesity related tweets. As expected, the network graph has two main clusters, around “obesity” and “diet”. The bigrams related to obesity are mostly negative. Some reflect the rising obesity rate in the U.S., such as “obesity prevalence” and “rampant obesity”, while others reflect the diseases associated with obesity, such as “childhood obesity” and “morbid obesity”. The diet related bigrams are more positive and include phrases related to physical exercise and maintaining a healthy body image.
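The counting step behind a bigram network can be sketched in a few lines: each pair of adjacent words becomes an edge, weighted by how often the pair occurs. This is a simplified illustration that assumes plain word tokenization and skips stop word removal; the function name `count_bigrams` and the example tweets are invented.

```python
import re
from collections import Counter


def count_bigrams(tweets):
    """Count adjacent word pairs (bigrams) across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        # Pair each word with its successor within the same tweet.
        counts.update(zip(words, words[1:]))
    return counts


# Invented example tweets for illustration.
example_tweets = [
    "Childhood obesity is rising",
    "Morbid obesity and childhood obesity",
]
bigram_counts = count_bigrams(example_tweets)
```

Each key in `bigram_counts` can then be treated as an edge between two word nodes, with the count as the edge weight and rare pairs filtered out before drawing the graph.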